Twittter Github Email

Course materials for 2020-11-2 AFEC at XTBG.

1 Prerequisites

install.packages("spdep")
library(tidyverse)
library(spdep)

2 Spatial autocorrelaiton

  • Neighbors affect the observation.

  • Let’s condier maps with some environmental variables (e.g., soil N, soil moisture…).
library(spdep)

set.seed(1234)
N <- 50
x <- seq(1, 25)
y <- seq(1, 25)
#z <- matrix(numeric(length(x) * length(y)), ncol = length(y))
z <- matrix(rnorm(length(x) * length(y)), ncol = length(y))

z0 <- z
z2 <- z
for (i in 2:(length(x)-1)) {
  for (j in 2:(length(y)-1)) {
    z[i, j] <- 
      rnorm(1, 
            mean(
              c(z[i - 1, j - 1],
              z[i - 1, j],
              z[i - 1, j + 1],
              z[i, j - 1],
              z[i, j + 1],
              z[i + 1, j - 1],
              z[i + 1, j],
              z[i + 1, j + 1])),
            0)
  }
}


z_scaled <- (z - mean(as.numeric(z))) / sd(as.numeric(z))

z2 <- z
z2[z < 0] <- "ridge"
z2[z > 0] <- "valley"
#z2[z > -0.5 & z < 0.5] <- "flat"

#z2 <- z2 + z
x2 <- rep(x, length(y))
y2 <- rep(y, each = length(x))
dat <- tibble(x = x2,
              y = y2,
              mu = as.vector(z),
              hab = as.vector(z2),
              z = as.numeric(z_scaled),
              z0 = as.numeric(z0)
)

ggplot(dat, aes(x = x, y = y, fill = z0)) +
  geom_raster()

  • Soil moisture (z0) are randomly distributed.
dat %>% 
  # cut the edges
  filter(x > 1) %>%
  filter(x < max(x)) %>%
  filter(y > 1) %>%
  filter(y < max(y)) %>%
  ggplot(., aes(x = x, y = y, fill = z)) +
    geom_raster()

  • Soil moisture (z) can show aggregated patterns.
ggplot(dat, aes(x = x, y = y, fill = hab)) +
  geom_raster()

  • Ridge sites have lower soil moisture
dat2 <- dat %>%
  mutate(hab_dummy = ifelse(hab == "valley", 0, 1)) %>%
  mutate(trait = rnorm(nrow(.), -mu, 0.3)) # based on z 
  #mutate(trait = rnorm(nrow(.), 2 * hab_dummy, 0.6)) # based on z 

dat2 %>%
  dplyr::select(x, y, hab, trait) %>%
  write_csv(., "./data/trait_env.csv")

dat2 %>%
  ggplot(., aes(x = hab, y = trait, col = hab)) +
  geom_violin() +
  geom_jitter(width = 0.2)

  • Points indicate mean trait values for each grid (local community).
  • Ridge sites have greater trait values (e.g., thick dense leaves to grow well in dry conditions)
  • Is this because those traits are favored in ridge sites?
  • Is this just because neighbors have similar trait values (spatial autocorrelation)?

3 Spatial autoregressive models (SAR)

\[ \left[ \begin{array}{r} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{array} \right] = \rho \left[ \begin{array}{rrrrrr} 0 & 1/3 & 0 & 1/3 & 1/3 & 0 \\ 1/3 & 0 &&&& \\ 0 && 0 &&& \\ 1/3 & & & 0 && \\ 1/3 & & & & 0 & \\ 0 & & & & & 0 \end{array} \right] \cdot \left[ \begin{array}{r} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{array} \right] + \epsilon \]

\[ y_1 = \rho 1/3 (y_2 + y_4 + y_5) + \epsilon \]

In general, autoregressive models take this kind of form.

\[ Y = X \beta + \rho W Y + \epsilon \]

  • \(Y\): Dependent variable (\(N \times 1\), e.g., CWM)
  • \(X\): Independent variable (\(N \times k\), e.g., ridge (1) or valley (0))
  • \(\beta\): Model parameters (\(k \times l\))
  • \(\rho\): Spatial autocorrelaiton parameter
  • \(W\): Spatial weight matrix (\(N \times N\))
  • \(\epsilon\): errors (\(N \times 1\))

Parameters, \(\beta\), \(\rho\), (variance of )\(\epsilon\), can be esimated by maximum likelihood or Bayesian methods.

4 R examples

  • Let’s apply spatial autocorrelaiton models for the above violin plot example.
  • First, we need to prepare a spatial weight matrix (W).
  • Then, we apply some R functions to fit \(Y = X \beta + \rho W Y + \epsilon\)

4.1 Spatial weight matrix

  • Read the dataset.
d <- read_csv("./data/trait_env.csv")
d
## # A tibble: 625 x 4
##        x     y hab      trait
##    <dbl> <dbl> <chr>    <dbl>
##  1     1     1 ridge   1.72  
##  2     2     1 valley -0.542 
##  3     3     1 valley -1.20  
##  4     4     1 ridge   2.64  
##  5     5     1 valley -0.567 
##  6     6     1 valley -0.0680
##  7     7     1 ridge   0.888 
##  8     8     1 ridge   0.0489
##  9     9     1 ridge   0.692 
## 10    10     1 ridge   0.464 
## # … with 615 more rows
  • Make a neighour list object to calcluate \(W\).
  • \(y_1\) is next to \(y_2\), \(y_4\), and \(y_5\).
(nb <- cell2nb(length(unique(d$x)),length(unique(d$y)), type = "queen"))
## Neighbour list object:
## Number of regions: 625 
## Number of nonzero links: 4704 
## Percentage nonzero weights: 1.204224 
## Average number of links: 7.5264
  • Make a spatial weight object (\(W\) but list in R)
(W <- nb2listw(nb, zero.policy = TRUE, style = "W"))
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 625 
## Number of nonzero links: 4704 
## Percentage nonzero weights: 1.204224 
## Average number of links: 7.5264 
## 
## Weights style: W 
## Weights constants summary:
##     n     nn  S0       S1       S2
## W 625 390625 625 169.8933 2510.443

4.2 Fit

  • Intercept only
fit1 <- spautolm(trait ~ 1, listw = W, data = d)

summary(fit1)
## 
## Call: spatialreg::spautolm(formula = formula, data = data, listw = listw, 
##     na.action = na.action, family = family, method = method, 
##     verbose = verbose, trs = trs, interval = interval, zero.policy = zero.policy, 
##     tol.solve = tol.solve, llprof = llprof, control = control)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.453519 -0.278259 -0.012225  0.315492  2.790838 
## 
## Coefficients: 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.0081439  0.0436434  0.1866    0.852
## 
## Lambda: 0.50479 LR test value: 77.169 p-value: < 2.22e-16 
## Numerical Hessian standard error of lambda: 0.051853 
## 
## Log likelihood: -515.0481 
## ML residual variance (sigma squared): 0.29194, (sigma: 0.54031)
## Number of observations: 625 
## Number of parameters estimated: 3 
## AIC: 1036.1
  • lambda (rho) = 0.5047927 indicates a positve spatial autocorrelaiton

  • Model with an environmetal predictor

fit2 <- spautolm(trait ~ hab, listw = W, data = d)

summary(fit2)
## 
## Call: spatialreg::spautolm(formula = formula, data = data, listw = listw, 
##     na.action = na.action, family = family, method = method, 
##     verbose = verbose, trs = trs, interval = interval, zero.policy = zero.policy, 
##     tol.solve = tol.solve, llprof = llprof, control = control)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.1115109 -0.2799992  0.0034629  0.2823200  2.5030493 
## 
## Coefficients: 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  0.359795   0.027954  12.871 < 2.2e-16
## habvalley   -0.710399   0.038084 -18.654 < 2.2e-16
## 
## Lambda: 0.11012 LR test value: 2.0986 p-value: 0.14744 
## Numerical Hessian standard error of lambda: 0.075134 
## 
## Log likelihood: -401.3136 
## ML residual variance (sigma squared): 0.21111, (sigma: 0.45947)
## Number of observations: 625 
## Number of parameters estimated: 4 
## AIC: 810.63
  • lambda (rho) = 0.1101209 indicates a weak positve spatial autocorrelaiton
  • Negative habvalley indicates valley sites have negative effects on trait values compared to ridge sites even after controlling spatial autocorrelaiton.

4.3 Exercise

  • Make a raster plot for \(\epsilon\).
    • fit1$fit$residuals is a vector of \(\epsilon\).
    • What does this figure mean?
  • Make a raster plot for \(\rho W Y\).
    • fit1$fit$fitted.values is a vector of \(\rho W Y\).
    • What does this figure mean?
  • Make violin plots or box plots for \(\epsilon\) and \(\rho W Y\).
    • geom_violin, geom_boxplot
    • What do those figures mean?